This is an automated email from the ASF dual-hosted git repository.

jmark99 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-examples.git

The following commit(s) were added to refs/heads/main by this push:
     new b34ed03  Update documentation for Sample and Shard examples (#101)
b34ed03 is described below

commit b34ed034d339dc8c5d6f8a77cd8c4b4371f22f83
Author: Mark Owens <jmar...@apache.org>
AuthorDate: Mon Apr 11 14:24:11 2022 -0400

    Update documentation for Sample and Shard examples (#101)

    Update Sample.md to indicate shard example should be run before proceeding with sampling portion of the shard example.

    Added namespace to name of shard table in Sample.md.

    Modified output of shard.Query output from Shard example to reflect recent changes in source code.
---
 docs/sample.md |  8 ++++++--
 docs/shard.md  | 28 +++++++++++++++++++++++-----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/docs/sample.md b/docs/sample.md
index bdf5be6..33c9e81 100644
--- a/docs/sample.md
+++ b/docs/sample.md
@@ -149,8 +149,10 @@ configuration for sample scan to work.
 Shard Sampling Example
 ----------------------
 
+Note: Before continuing, you need to complete the Shard example, located [here][shard].
+
 The Shard example shows how to index and search files using Accumulo. That
-example indexes documents into a table named `shard`. The indexing scheme used
+example indexes documents into a table named `examples.shard`. The indexing scheme used
 in that example places the document name in the column qualifier. A useful
 sample of this indexing scheme should contain all data for any document in the
 sample. To accomplish this, the following commands build a sample for the
@@ -184,9 +186,11 @@ Another way sample data could be used with the shard example is with a
 specialized iterator. In the examples source code there is an iterator named
 CutoffIntersectingIterator. This iterator first checks how many documents are
 found in the sample data. If too many documents are found in the sample data,
-then it returns nothing. Otherwise it proceeds to query the full data set.
+then it returns nothing. Otherwise, it proceeds to query the full data set.
 To experiment with this iterator, use the following command. The
 `--sampleCutoff` option below will cause the query to return nothing if based
 on the sample it appears a query would return more than 1000 documents.
 
     $ ./bin/runex shard.Query --sampleCutoff 1000 -t examples.shard import int | fgrep '.java' | wc
+
+[shard]: shard.md
diff --git a/docs/shard.md b/docs/shard.md
index 97a9d40..d779c6f 100644
--- a/docs/shard.md
+++ b/docs/shard.md
@@ -37,11 +37,29 @@ After creating the tables, index some files. The following command indexes all o
 The following command queries the index to find all files containing 'foo' and 'bar'.
 
     $ ./bin/runex shard.Query -t examples.shard foo bar
-    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
-    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
-    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
-    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
-    /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/replication/ReplicationTargetTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/client/admin/NewTableConfigurationTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/spi/balancer/HostRegexTableLoadBalancerTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/data/KeyExtentTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/iterators/user/WholeRowIteratorTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/iterators/user/WholeColumnFamilyIteratorTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/data/KeyBuilderTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/security/ColumnVisibilityTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/conf/IterConfigUtilTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/summary/SummaryCollectionTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/clientImpl/TableOperationsHelperTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/clientImpl/mapreduce/BatchInputSplitTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/spi/balancer/HostRegexTableLoadBalancerReconfigurationTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/client/IteratorSettingTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/client/mapred/RangeInputSplitTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/spi/balancer/BaseHostRegexTableLoadBalancerTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/conf/HadoopCredentialProviderTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/replication/ReplicationSchemaTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java
+    /path/to/accumulo/core/src/test/java/org/apache/accumulo/core/security/VisibilityEvaluatorTest.java
+
 
 In order to run ContinuousQuery, we need to run Reverse.java to populate the `examples.doc2term` table.