[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-21 Thread Chen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596670#comment-15596670
 ] 

Chen Li commented on ASTERIXDB-1700:


Glad to know issue 1) is fixed.

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
>
> If there are multiple indexes on the same field, we use the intersect operator 
> to integrate the results from each index. In the following AQL query, we have 
> two n-gram indexes on the same field, and a null pointer exception occurs.
> {code}
> java.lang.NullPointerException
>   at org.apache.hyracks.api.job.JobSpecification.getInputConnectorDescriptor(JobSpecification.java:192)
>   at org.apache.hyracks.api.job.JobSpecification.getInputConnectorDescriptor(JobSpecification.java:188)
>   at org.apache.hyracks.api.client.impl.JobActivityGraphBuilder.addSourceEdge(JobActivityGraphBuilder.java:81)
>   at org.apache.hyracks.dataflow.std.base.AbstractSingleActivityOperatorDescriptor.contributeActivities(AbstractSingleActivityOperatorDescriptor.java:54)
>   at org.apache.hyracks.api.client.impl.JobSpecificationActivityClusterGraphGeneratorFactory$2.visit(JobSpecificationActivityClusterGraphGeneratorFactory.java:67)
>   at org.apache.hyracks.api.client.impl.PlanUtils.visitOperator(PlanUtils.java:41)
>   at org.apache.hyracks.api.client.impl.PlanUtils.visit(PlanUtils.java:34)
>   at org.apache.hyracks.api.client.impl.JobSpecificationActivityClusterGraphGeneratorFactory.createActivityClusterGraphGenerator(JobSpecificationActivityClusterGraphGeneratorFactory.java:64)
>   at org.apache.hyracks.control.cc.work.JobStartWork.doRun(JobStartWork.java:61)
>   at org.apache.hyracks.control.common.work.SynchronizableWork.run(SynchronizableWork.java:39)
>   at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Oct 19, 2016 7:10:22 PM org.apache.asterix.app.translator.QueryTranslator handleQuery
> {code}
> {code}
> create type DBLPType as closed {
>   id: int64,
>   dblpid: string,
>   title: string,
>   authors: string,
>   misc: string
> }
> create dataset DBLP(DBLPType)
>   primary key id on group1;
> create index ngram2_index on DBLP(authors) type ngram(2);
> create index ngram3_index on DBLP(authors) type ngram(3);
> for $o in dataset('DBLP')
> let $ed := edit-distance-check($o.authors, "Amihay Motro", 1)
> where $ed[0]
> return $o
> {code}
> {code}
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     select (function-call: asterix:get-item, Args:[function-call: asterix:edit-distance-check, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {3}], AString: {Amihay Motro}, AInt64: {1}], AInt64: {0}])
>     -- STREAM_SELECT  |PARTITIONED|
>       project ([$$0])
>       -- STREAM_PROJECT  |PARTITIONED|
>         exchange
>         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>           unnest-map [$$6, $$0] <- function-call: asterix:index-search, Args:[AString: {DBLP}, AInt32: {0}, AString: {test}, AString: {DBLP}, ABoolean: {false}, ABoolean: {false}, AInt32: {1}, %0->$$9, AInt32: {1}, %0->$$9, TRUE, TRUE, TRUE]
>           -- BTREE_SEARCH  |PARTITIONED|
>             exchange
>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>               intersect ([$$9] <- [[$$9], [$$11]])
>               -- INTERSECT  |PARTITIONED|
>                 exchange
>                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                   order (ASC, %0->$$9)
>                   -- STABLE_SORT [$$9(ASC)]  |PARTITIONED|
>                     exchange
>                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                       unnest-map [$$9] <- function-call: asterix:index-search, Args:[AString: {ngram2_index}, AInt32: {5}, AString: {test}, AString: {DBLP}, ABoolean: {false}, ABoolean: {false}, AInt32: {2}, AInt64: {1}, AInt32: {12}, AInt32: {1}, %0->$$8]
>                       -- LENGTH_PARTITIONED_INVERTED_INDEX_SEARCH  |PARTITIONED|
>                         project ([$$8])
>                         -- STREAM_PROJECT  |PARTITIONED|
>                           assign [$$8] <- [%0->$$10]
>                           -- ASSIGN  |PARTITIONED|
>                             exchange
>                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                               replicate
>                               -- REPLICATE  |PARTITIONED|
>                                 exchange
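
For readers following the quoted plan: the description says the results from each index are combined with an intersect operator, and the plan sorts each index-search output on the primary key before the INTERSECT step. The Python sketch below only illustrates that idea as a merge-style intersection of two ascending key streams. The function name intersect_sorted is hypothetical, and the sketch assumes each stream holds unique, already sorted keys; it is not the Hyracks operator.

```python
def intersect_sorted(left, right):
    """Merge-intersect two ascending streams of unique primary keys.

    Illustration only: assumes both inputs are sorted ascending,
    contain no duplicates, and never contain None.
    """
    left_it, right_it = iter(left), iter(right)
    out = []
    a, b = next(left_it, None), next(right_it, None)
    while a is not None and b is not None:
        if a == b:
            # Key found by both index searches: keep it and advance both.
            out.append(a)
            a, b = next(left_it, None), next(right_it, None)
        elif a < b:
            a = next(left_it, None)
        else:
            b = next(right_it, None)
    return out


# Made-up primary keys standing in for the two n-gram index search results.
from_ngram2 = sorted([7, 3, 9, 12])
from_ngram3 = sorted([12, 3, 5])
print(intersect_sorted(from_ngram2, from_ngram3))
```

Only keys present in both streams survive, which is why each input has to be ordered on the same key first.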
>   

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593861#comment-15593861
 ] 

ASF subversion and git services commented on ASTERIXDB-1700:


Commit 68c8e9befabfa7def67ce3a1cc93dba05966bd15 in asterixdb's branch 
refs/heads/master from [~wangsaeu]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=68c8e9b ]

ASTERIXDB-1700: fixed multiple same type of index application error on the same 
field

 - Fixed an issue that multiple same type of indexes can be applied for the 
same field.
   For this situation, applying only one index will be enough.
   (e.g., 2-gram and 3-gram index on the same field)

Change-Id: I450f3adb20c777d5b9a8f638e010076b9d817942
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1307
Tested-by: Jenkins 
Integration-Tests: Jenkins 
Reviewed-by: Jianfeng Jia 


> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
>

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-20 Thread Taewoo Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592566#comment-15592566
 ] 

Taewoo Kim commented on ASTERIXDB-1700:
---

In summary, there are two issues:
 1) Multiple indexes of the same type on the same predicate (field) are used. 
This needs to be avoided.
 2) A Replicate operator is introduced by ExtractCommonOperatorsRule. 
ExtractCommonOperatorsRule doesn't transform the plan correctly, since the 
runtime complains about the input connector. 

For 1), we will add a quick fix so that only one of the indexes is used. For 
2), I think it's a rare case, and Jianfeng also mentioned that we need to check 
the relationship between the Intersect operator and the Replicate operator. So, 
for now, let's focus on 1).
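
The quick fix for 1) can be pictured as keeping a single candidate index per (field, index type) pair before any intersection is planned. The Python sketch below is an illustration under assumptions, not the actual optimizer change: IndexCandidate and keep_one_per_field_and_kind are hypothetical names, and "keep the first one seen" is an arbitrary tie-break chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexCandidate:
    name: str   # index name, e.g. "ngram2_index"
    field: str  # indexed field, e.g. "authors"
    kind: str   # index type, e.g. "ngram" or "btree"


def keep_one_per_field_and_kind(candidates):
    """Keep the first candidate seen for each (field, kind) pair.

    Hypothetical helper: the tie-break (first seen wins) is an assumption.
    """
    chosen = {}
    for idx in candidates:
        chosen.setdefault((idx.field, idx.kind), idx)
    return list(chosen.values())


candidates = [
    IndexCandidate("ngram2_index", "authors", "ngram"),
    IndexCandidate("ngram3_index", "authors", "ngram"),
]
selected = keep_one_per_field_and_kind(candidates)
print([idx.name for idx in selected])
```

With only one n-gram index left for the authors field, no intersection between two indexes of the same type is needed for this predicate.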

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
>

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-20 Thread Taewoo Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592330#comment-15592330
 ] 

Taewoo Kim commented on ASTERIXDB-1700:
---

A 2-gram index on the search predicate field can be used for 
edit-distance-check. The same applies to the 3-gram index on the search 
predicate field. So, the results of these two index searches will be 
intersected. 
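
As background on why an n-gram index of either length can serve edit-distance-check: a single edit changes at most n of a string's n-grams, so a string within edit distance k of the query must still share at least (len(query) - n + 1) - k * n n-grams with it. The Python sketch below works this out for the query string from the issue. It assumes plain character n-grams with no padding and no case folding, which may differ from the AsterixDB tokenizer; the helper names are hypothetical.

```python
from collections import Counter


def ngrams(s, n):
    """All character n-grams of s, in order (no padding; an assumption)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def min_shared_grams(query, n, k):
    """Lower bound on n-grams shared with any string within edit distance k.

    A value of zero or less means the n-gram filter cannot prune anything.
    """
    return (len(query) - n + 1) - k * n


def shared_grams(a, b, n):
    """Number of n-grams common to a and b, counted as multisets."""
    return sum((Counter(ngrams(a, n)) & Counter(ngrams(b, n))).values())


query = "Amihay Motro"
near = "Amihay Motra"  # made-up string, one substitution away from the query
for n in (2, 3):
    bound = min_shared_grams(query, n, 1)
    print(n, bound, shared_grams(query, near, n))
```

For both n = 2 and n = 3 the near match clears the bound, so either index alone can produce a valid candidate set for this predicate.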

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Jianfeng Jia
>

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-20 Thread Taewoo Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590865#comment-15590865
 ] 

Taewoo Kim commented on ASTERIXDB-1700:
---

Yes. It runs well. It took me some time to reproduce this issue. 

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Jianfeng Jia
>

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-19 Thread Chen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590893#comment-15590893
 ] 

Chen Li commented on ASTERIXDB-1700:


Why does the plan use both indexes?  One of them is sufficient.  Can you look 
into the way this plan is generated?

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
>  Issue Type: Bug
>Reporter: Taewoo Kim
>Assignee: Jianfeng Jia
>

[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-19 Thread Taewoo Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590529#comment-15590529
 ] 

Taewoo Kim commented on ASTERIXDB-1700:
---

Another issue: I think conducting the 2-gram index search and the 3-gram index 
search at the same time is not necessary. Only one index-search might be enough 
for this case. 
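
A small experiment can make this point concrete. Each n-gram index search only produces candidates, and the final select still evaluates edit-distance-check on every candidate, so verifying the candidates from one index should give the same answer as verifying the intersection of both. The Python sketch below checks that on made-up records. The helper names, the unpadded n-gram tokenization, and the sample strings are all assumptions for illustration; it does not model the AsterixDB runtime.

```python
from collections import Counter


def gram_bag(s, n):
    """Multiset of character n-grams of s (no padding; an assumption)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def edit_distance(a, b):
    """Classic Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(records, query, n, k):
    """Ids whose text shares enough n-grams with the query (count filter)."""
    need = (len(query) - n + 1) - k * n
    q = gram_bag(query, n)
    return {rid for rid, text in records.items()
            if sum((q & gram_bag(text, n)).values()) >= need}


def verify(records, ids, query, k):
    """Final check, playing the role of the select on edit-distance-check."""
    return {rid for rid in ids if edit_distance(records[rid], query) <= k}


# Made-up records keyed by primary key.
records = {1: "Amihay Motro", 2: "Amihay Motra", 3: "Amihai Motro",
           4: "Another Author", 5: "Unrelated Name"}
query, k = "Amihay Motro", 1
c2 = candidates(records, query, 2, k)
c3 = candidates(records, query, 3, k)
via_both = verify(records, c2 & c3, query, k)
via_one = verify(records, c2, query, k)
brute = verify(records, records.keys(), query, k)
print(via_one == via_both == brute)
```

Intersecting can shrink the candidate set, but it cannot change the verified result, which is why one index search is enough for correctness here.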

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Jianfeng Jia
>





[jira] [Commented] (ASTERIXDB-1700) edit-distance-check on the fields with the 2-gram and the 3-gram index generates a null pointer exception.

2016-10-19 Thread Jianfeng Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590756#comment-15590756
 ] 

Jianfeng Jia commented on ASTERIXDB-1700:
-

Did you try using the 2-gram index only and the 3-gram index only, to see 
whether each runs successfully? 
If both run successfully, then there must be something wrong with how the 
inputs of the two access paths are set in `IntroduceSelectAccessMethodRule`.

> edit-distance-check on the fields with the 2-gram and the 3-gram index 
> generates a null pointer exception.
> --
>
> Key: ASTERIXDB-1700
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1700
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Jianfeng Jia
>