[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-10-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550947#comment-15550947
 ] 

ASF subversion and git services commented on ASTERIXDB-1636:


Commit ecba52e0b9eca4f59b9b7fc082b720d677b4d98d in asterixdb's branch 
refs/heads/master from [~imaxon]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=ecba52e ]

Tests for ASTERIXDB-1636

This is a test for the scenario described in the Jira issue. The only
liberty I have taken is replacing the socket feed with a file feed. The
test case fails when I revert AqlMetadataProvider to the previous
version, and should pass now with this parent.

Change-Id: Ic1521f1d53121b6768ac123e49e731932c85
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1248
Sonar-Qube: Jenkins 
Reviewed-by: Taewoo Kim 
Tested-by: Jenkins 
Integration-Tests: Jenkins 


> Feed cannot re-ingest after cluster restart
> ---
>
> Key: ASTERIXDB-1636
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1636
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: Feeds, Storage
> Environment: master
> commit c89d668f68e5430a6ba4455daf8f9cd6f7040dd8
> Date:   Tue Sep 6 18:29:23 2016 -0700
>Reporter: Jianfeng Jia
>Assignee: Ian Maxon
>Priority: Blocker
>  Labels: soon
>
> Here are steps to reproduce the problem:
> 1. start the cluster
> 2. ingest the initial data using file feed 
> [script|https://gist.github.com/JavierJia/9ed7744c938c5cb66aba63007b86a987]
> 2.1: file for ingestion: 
> https://drive.google.com/open?id=0B423M7wGZj9dNE5HenFqcjhuUFk
> 3. start another socket feed 
> [script|https://gist.github.com/JavierJia/565cefd9322df35c7abeefbfcfcee9f8] 
> to ingest the live data  
> 4. restart the cluster
> 5. start that live socket feed again.
> 6. with your own twitter credential you can use [this 
> script|https://github.com/ISG-ICS/cloudberry/blob/master/streamFeed.sh]  to 
> ingest the tweet
> 7. It will send at most 280 tweets and stops forever.
> [~imaxon] [~idleft] if you can help that will be great.
> related to ASTERIXDB-1264



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534686#comment-15534686
 ] 

ASF subversion and git services commented on ASTERIXDB-1636:


Commit 2685b60d9e03a515fcc5260a3dd4399740f5ec40 in asterixdb's branch 
refs/heads/master from [~imaxon]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=2685b60 ]

Fixes for ASTERIXDB-1636

The index of the tuple field for filters from SecondaryIndexOperationsHelper
and AqlMetadataProvider differed. The one in AqlMetadataProvider was wrong,
as it was attempting to take into account the presence of a partitioning
field in the incoming tuple, which is not there in the case of an
insert/upsert.

There was also an issue where on merge, for components with a filter page
but no min/max, the merge would fail. I fixed this by skipping over null
entries while getting the min/max of merging components.

Finally, there was a very silly error in LSMComponentFilterManager which was
causing the filter page to appear as blank, because the page was being
pinned with the wrong argument. That is also fixed.

Change-Id: Ib4bc413fcda9a5c98ae57f94e1c8a68fe9aacda3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1205
Sonar-Qube: Jenkins 
Reviewed-by: Taewoo Kim 
Tested-by: Jenkins 
Integration-Tests: Jenkins 
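
The merge fix described in the commit message above (skipping null entries
while taking the min/max of the merging components) can be illustrated with
a minimal sketch. This is not the AsterixDB code: the class
FilterMergeSketch, the ComponentFilter type and the use of Integer values
are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class FilterMergeSketch {

    /** Hypothetical stand-in for one component's filter; min/max stay null if never set. */
    static final class ComponentFilter {
        final Integer min;
        final Integer max;

        ComponentFilter(Integer min, Integer max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Combines the filters of the merging components, ignoring unset (null) bounds. */
    static ComponentFilter merge(List<ComponentFilter> filters) {
        Integer min = null;
        Integer max = null;
        for (ComponentFilter f : filters) {
            // A component with a filter page but no min/max contributes nothing.
            if (f.min != null && (min == null || f.min < min)) {
                min = f.min;
            }
            if (f.max != null && (max == null || f.max > max)) {
                max = f.max;
            }
        }
        return new ComponentFilter(min, max);
    }

    public static void main(String[] args) {
        ComponentFilter merged = merge(Arrays.asList(
                new ComponentFilter(10, 20),
                new ComponentFilter(null, null), // empty component, no min/max
                new ComponentFilter(5, 15)));
        System.out.println(merged.min + ".." + merged.max); // prints 5..20
    }
}
```

If every component has unset bounds, the merged filter in this sketch also
keeps null bounds rather than failing.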




[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-21 Thread Ian Maxon (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512156#comment-15512156
 ] 

Ian Maxon commented on ASTERIXDB-1636:
--

I've found and possibly fixed one issue so far that was directly causing
this. It was related to a difference in how the filter index position is
computed by AqlMetadataProvider and by
SecondaryInvertedIndexOperationsHelper. The latter says the position is
num PKs + num SKs, which seems to be correct; the former gives an index
that is out of bounds. The former is used after restart, while the latter
is used when the index is created. This would explain why everything seems
to work just fine until restart.

There also seem to be an issue or two with how filters are stored for
inverted indices in general, however. One is that an "identity" on-disk
inverted index (so 0 tuples) may have a filter page, and this will cause
merges involving it to fail. The other is that there appears to be a way
in which tuples written to the inverted index might bypass updating the
filters entirely. I'm less sure of this one and still need to dig into it
more.
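
To make the position mismatch concrete, here is a small sketch of the two
computations. It is not taken from either class: the method names, the
"+ 1" offset for a partitioning field, and the example of 1 primary key plus
1 secondary key are assumptions chosen only to illustrate the description
above.

```java
public class FilterFieldIndexSketch {

    /** Position described above as correct: the filter field follows the keys. */
    static int filterFieldIndex(int numPrimaryKeys, int numSecondaryKeys) {
        return numPrimaryKeys + numSecondaryKeys;
    }

    /**
     * Hypothetical reconstruction of the faulty variant: it reserves one extra
     * slot for a partitioning field that an insert/upsert tuple does not carry.
     */
    static int filterFieldIndexAssumingPartitioningField(int numPrimaryKeys, int numSecondaryKeys) {
        return numPrimaryKeys + numSecondaryKeys + 1;
    }

    /** True if fieldIndex addresses an existing field of a tuple with fieldCount fields. */
    static boolean isValidField(int fieldIndex, int fieldCount) {
        return fieldIndex >= 0 && fieldIndex < fieldCount;
    }

    public static void main(String[] args) {
        int numPrimaryKeys = 1;   // assumed for illustration
        int numSecondaryKeys = 1; // assumed for illustration
        // Incoming insert tuple: secondary key, primary key, filter value.
        int fieldCount = numPrimaryKeys + numSecondaryKeys + 1;

        int good = filterFieldIndex(numPrimaryKeys, numSecondaryKeys);
        int bad = filterFieldIndexAssumingPartitioningField(numPrimaryKeys, numSecondaryKeys);

        System.out.println(good + " valid: " + isValidField(good, fieldCount)); // 2 valid: true
        System.out.println(bad + " valid: " + isValidField(bad, fieldCount));   // 3 valid: false
    }
}
```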



[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-19 Thread Ian Maxon (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505625#comment-15505625
 ] 

Ian Maxon commented on ASTERIXDB-1636:
--

What seems to be happening is that on initial index creation the LSM index
gets the correct filter variable position during initialization. Where that
value comes from initially isn't clear to me. It seems like the value
computed by AqlMetadataProvider is always wrong, but somehow on the first
run it is corrected.



[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-09 Thread Ian Maxon (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478436#comment-15478436
 ] 

Ian Maxon commented on ASTERIXDB-1636:
--

Jianfeng, Xikui and I were discussing this. Apparently everything is fine until 
one tries to insert after restart. Queries seem to work fine. Overall it seems 
more likely a storage issue than a feeds issue. 

> Feed cannot re-ingest after cluster restart
> ---
>
> Key: ASTERIXDB-1636
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1636
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: Feeds, Storage
> Environment: master
> commit c89d668f68e5430a6ba4455daf8f9cd6f7040dd8
> Date:   Tue Sep 6 18:29:23 2016 -0700
>Reporter: Jianfeng Jia
>Assignee: Abdullah Alamoudi
>Priority: Blocker
>  Labels: soon


[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-09 Thread Xikui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477625#comment-15477625
 ] 

Xikui Wang commented on ASTERIXDB-1636:
---

I got the log from Jianfeng. It seems the exception is raised after the
record is parsed. It is probably related to ASTERIXDB-1616.

{quote}
org.apache.hyracks.api.exceptions.HyracksDataException: 3
at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:152)
at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushAndReset(AbstractOneInputOneOutputOneFramePushRuntime.java:63)
at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:97)
at org.apache.hyracks.algebricks.runtime.operators.std.StreamProjectRuntimeFactory$1.nextFrame(StreamProjectRuntimeFactory.java:83)
at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushAndReset(AbstractOneInputOneOutputOneFramePushRuntime.java:63)
at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:85)
at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:154)
at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:148)
at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.flushFrame(FrameUtils.java:45)
at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:160)
at org.apache.asterix.external.feed.dataflow.SyncFeedRuntimeInputHandler.nextFrame(SyncFeedRuntimeInputHandler.java:46)
at org.apache.asterix.external.operators.FeedMetaStoreNodePushable.nextFrame(FeedMetaStoreNodePushable.java:145)
at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:349)
at org.apache.hyracks.control.nc.Task.run(Task.java:297)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hyracks.storage.am.common.tuples.PermutingFrameTupleReference.getFieldLength(PermutingFrameTupleReference.java:56)
at org.apache.hyracks.storage.am.common.tuples.PermutingTupleReference.getFieldLength(PermutingTupleReference.java:54)
at org.apache.hyracks.storage.am.common.tuples.TypeAwareTupleWriter.bytesRequired(TypeAwareTupleWriter.java:43)
at org.apache.hyracks.storage.am.lsm.common.impls.LSMComponentFilter.update(LSMComponentFilter.java:68)
at org.apache.hyracks.storage.am.lsm.invertedindex.impls.LSMInvertedIndex.modify(LSMInvertedIndex.java:367)
at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:376)
at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:356)
at org.apache.hyracks.storage.am.lsm.invertedindex.impls.LSMInvertedIndexAccessor.forceInsert(LSMInvertedIndexAccessor.java:143)
at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:128)
... 18 more
{quote}
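
The "Caused by" part of the trace goes through a permuting tuple reference,
that is, a view that maps its own field numbers onto fields of an
underlying tuple. The sketch below is a simplified analogue and not the
Hyracks classes: PermutingView, its int[] representation of field lengths
and the example values are assumptions. It only shows how a permutation
entry that points one past the last field produces an
ArrayIndexOutOfBoundsException when a field length is requested.

```java
public class PermutingReferenceSketch {

    /** Hypothetical, simplified analogue of a permuting tuple reference. */
    static final class PermutingView {
        private final int[] fieldPermutation; // view field -> underlying tuple field
        private int[] fieldLengths;           // lengths of the underlying tuple's fields

        PermutingView(int[] fieldPermutation) {
            this.fieldPermutation = fieldPermutation;
        }

        /** Points the view at a new underlying tuple. */
        void reset(int[] fieldLengths) {
            this.fieldLengths = fieldLengths;
        }

        int getFieldLength(int viewField) {
            // Throws if the permutation refers to a field the tuple does not have.
            return fieldLengths[fieldPermutation[viewField]];
        }
    }

    public static void main(String[] args) {
        int[] tupleFieldLengths = {4, 8, 8}; // assumed tuple with fields 0..2

        PermutingView goodFilterView = new PermutingView(new int[] {2});
        goodFilterView.reset(tupleFieldLengths);
        System.out.println("filter field length: " + goodFilterView.getFieldLength(0));

        PermutingView badFilterView = new PermutingView(new int[] {3});
        badFilterView.reset(tupleFieldLengths);
        try {
            badFilterView.getFieldLength(0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds for a tuple with "
                    + tupleFieldLengths.length + " fields");
        }
    }
}
```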


[jira] [Commented] (ASTERIXDB-1636) Feed cannot re-ingest after cluster restart

2016-09-08 Thread Ian Maxon (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474686#comment-15474686
 ] 

Ian Maxon commented on ASTERIXDB-1636:
--

Sure, I can take a look to see if it reproduces for me.
