[ https://issues.apache.org/jira/browse/ASTERIXDB-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523194#comment-15523194 ]
Wenhai commented on ASTERIXDB-1412:
-----------------------------------

Commit 58dfdc99aa7f0f292be5a59b922148e517c8ce3e in asterixdb's branch refs/heads/master from Jianfeng Jia
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=58dfdc9 ]

Fix ASTERIXDB-1566 fix UTF8 comparator and hash function.

Change-Id: I187bf1243abf143b3b265fa8098614b9a72c65ad
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1054
Sonar-Qube: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Reviewed-by: Yingyi Bu <buyin...@gmail.com>
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>


> Batch import errors
> -------------------
>
> Key: ASTERIXDB-1412
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1412
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: Data Formats, Hyracks
> Environment: Linux Ubuntu 12.04, 24 cores, 128GB memory;
> Configuration: 2 NCs x 12 partitions, 10GB per NC
> Reporter: Wenhai
> Assignee: Abdullah Alamoudi
>
> When importing about 2 million ACM records with the following schema, we
> got a data-field mismatch error. When we split the records into 6 parts,
> the error disappears. We have put the dataset on the following site so
> that the problem can be verified.
> Dataset
> {noformat}
> http://pan.baidu.com/s/1c26Y9LQ
> {noformat}
> Schema
> {noformat}
> drop dataverse fuzzytest if exists;
> create dataverse fuzzytest;
> use dataverse fuzzytest;
> create type PaperType as open {
>   tid:uuid,
>   title: string,
>   authors: string?,
>   year: int?,
>   conf: string?,
>   idx: string,
>   abstract: string?
> }
> {noformat}
> Split import (succeeds)
> {noformat}
> use dataverse test;
> create dataset ACM(PaperType) primary key tid autogenerated;
> load dataset ACM
> using localfs
> (("path"="127.0.0.1:///home/hadoop/Downloads/doccorpus/acm_split.aa,127.0.0.1:///home/hadoop/Downloads/doccorpus/acm_split.ab,127.0.0.1:///home/hadoop/Downloads/doccorpus/acm_split.ac,127.0.0.1:///home/hadoop/Downloads/doccorpus/acm_split.ad,127.0.0.1:///home/hadoop/Downloads/doccorpus/acm_split.ae"),("format"="delimited-text"),("delimiter"="#"),("quote"="\u0000"));
> {noformat}
> Batch import (fails)
> {noformat}
> use dataverse test;
> drop dataset ACM if exists;
> create dataset ACM(PaperType) primary key tid autogenerated;
> load dataset ACM
> using localfs
> (("path"="127.0.0.1:///home/hadoop/Downloads/doccorpus/reproduce/acm_raw.txt"),("format"="delimited-text"),("delimiter"="#"),("quote"="\u0000"));
> {noformat}
> Error message
> {noformat}
> Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design [HyracksDataException]
> {noformat}
> Error trace
> {noformat}
> org.apache.hyracks.api.exceptions.HyracksDataException: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:218)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:83)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:263)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:212)
>     ... 5 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
>     at org.apache.asterix.external.operators.ExternalDataScanOperatorDescriptor$1.initialize(ExternalDataScanOperatorDescriptor.java:65)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:83)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:205)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:202)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
>     at org.apache.asterix.external.dataflow.RecordDataFlowController.start(RecordDataFlowController.java:65)
>     at org.apache.asterix.external.dataset.adapter.GenericAdapter.start(GenericAdapter.java:37)
>     at org.apache.asterix.external.operators.ExternalDataScanOperatorDescriptor$1.initialize(ExternalDataScanOperatorDescriptor.java:62)
>     ... 7 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Long Parser - a digit is expected. But, encountered this character: V in the incoming input: VLSID 05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
>     at org.apache.hyracks.dataflow.common.data.parsers.LongParserFactory$1.parse(LongParserFactory.java:74)
>     at org.apache.asterix.external.parser.DelimitedDataParser.parseRecord(DelimitedDataParser.java:144)
>     at org.apache.asterix.external.parser.DelimitedDataParser.parse(DelimitedDataParser.java:159)
>     at org.apache.asterix.external.dataflow.RecordDataFlowController.start(RecordDataFlowController.java:57)
>     ... 9 more
> Apr 24, 2016 11:38:19 AM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskFailure
> Apr 24, 2016 11:38:19 AM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: AbortTasks
> Apr 24, 2016 11:38:19 AM org.apache.hyracks.control.nc.work.AbortTasksWork run
> INFO: Aborting Tasks: JID:19:[TAID:TID:ANID:ODID:1:0:12:0, TAID:TID:ANID:ODID:1:0:13:0, TAID:TID:ANID:ODID:1:0:14:0, TAID:TID:ANID:ODID:1:0:15:0, TAID:TID:ANID:ODID:1:0:16:0, TAID:TID:ANID:ODID:1:0:17:0, TAID:TID:ANID:ODID:1:0:18:0, TAID:TID:ANID:ODID:1:0:19:0, TAID:TID:ANID:ODID:1:0:20:0, TAID:TID:ANID:ODID:1:0:21:0, TAID:TID:ANID:ODID:1:0:22:0, TAID:TID:ANID:ODID:1:0:23:0]
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:299)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Apr 24, 2016 11:38:19 AM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: CleanupJoblet
> {noformat}
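The rejected value "VLSID 05 Proceedings of ..." reads like conference text arriving in the `year` column, which the trace shows is handled by a Long parser (LongParserFactory, called from DelimitedDataParser.parseRecord). One way to check whether acm_raw.txt itself contains rows whose '#'-separated layout would shift columns is a small pre-load scan. The sketch below is hypothetical and not part of AsterixDB; the class and method names are invented for illustration. It assumes one record per line, the column order title#authors#year#conf#idx#abstract (tid omitted because it is autogenerated), and that an empty `year` value is acceptable for the optional `int?` field.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-load checker (not part of AsterixDB) for a '#'-delimited file.
// Assumptions: one record per line; columns in PaperType order without the
// autogenerated tid, i.e. title#authors#year#conf#idx#abstract.
final class DelimitedRecordChecker {
    static final char DELIMITER = '#';
    static final int EXPECTED_FIELDS = 6; // assumed number of non-key PaperType fields
    static final int YEAR_INDEX = 2;      // assumed position of the year column

    // Splits on every delimiter and keeps empty fields, including trailing ones.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == DELIMITER) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start));
        return fields;
    }

    // Accepts an empty value (assumed to mean "missing" for year: int?) or ASCII digits only.
    static boolean looksLikeYear(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Returns 1-based numbers of lines whose column count or year value would not fit PaperType.
    static List<Integer> findSuspectLines(List<String> lines) {
        List<Integer> suspects = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            List<String> fields = split(lines.get(n));
            // The size check runs first, so get(YEAR_INDEX) is only reached for 6-field lines.
            if (fields.size() != EXPECTED_FIELDS || !looksLikeYear(fields.get(YEAR_INDEX))) {
                suspects.add(n + 1);
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        // The default file name is only an example; pass the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "acm_raw.txt");
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        for (int n : findSuspectLines(lines)) {
            System.out.println("line " + n + ": " + lines.get(n - 1));
        }
    }
}
```

If the scan reports nothing for acm_raw.txt while the batch load still fails, that would point away from the data and toward how the single large file is read, which would be consistent with the split files loading cleanly. For a file of this size, a streaming reader (Files.newBufferedReader) would avoid holding every line in memory.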