Hello, and thanks in advance for any input!
I have to import JSON files that were exported from a MongoDB database with
mongoexport. The files contain records structured like this:
[{
  "_id": {
    "$oid": "5da2fe0542eba4003e31665d"
  },
  "id": "3BD000001999FA9A8FA6D35C",
  "idFormat": "SER96",
  "type": "ITEM",
  "siteName": "350",
  "regionName": null,
  "site": "Xd4f87b6f-5199-43ac-b231-fbe6e3a8039c",
  "state": "VALID",
  "generated": null,
  "stateChangeReason": "NONE",
  "floor": 0,
  "confidence": 90,
  "region": "R9b3e5236-1b77-397a-bc0f-e97cb072ba37",
  "timestamp": {
    "$date": "2019-10-13T06:59:58Z"
  },
  "x": 735,
  "y": 731,
  "z": 36,
  "productId": "107380390",
  "events": [
    "POSITION_CHANGE",
    "REGION_CHANGE"
  ],
  "product": null
}]
Apache Drill reads this file without trouble if "$date" is replaced with
"date", but fails with this error when the $ is present:
(java.time.format.DateTimeParseException) Text '2019-10-13T06:59:58Z' could
not be parsed at index 19
The export file is large (~1 GB) and I cannot preprocess it to remove the
dollar signs. Any help in getting the Drill JSON reader to parse the date, or
even just to treat it as text, would be greatly appreciated!
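For what it's worth, the rewrite I'm trying to avoid would look roughly like
this (a minimal Python sketch; it collapses the mongoexport type wrappers in
one parsed record, and would have to be run streaming, record by record, to
cope with a file this size):

```python
import json

def unwrap(value):
    """Recursively collapse mongoexport type wrappers such as
    {"$oid": "..."} or {"$date": "..."} into their bare values."""
    if isinstance(value, dict):
        keys = list(value)
        # A single-key object whose key starts with "$" is a type wrapper.
        if len(keys) == 1 and keys[0].startswith("$"):
            return unwrap(value[keys[0]])
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value

record = json.loads('{"_id": {"$oid": "5da2fe0542eba4003e31665d"}, '
                    '"timestamp": {"$date": "2019-10-13T06:59:58Z"}}')
print(json.dumps(unwrap(record)))
# {"_id": "5da2fe0542eba4003e31665d", "timestamp": "2019-10-13T06:59:58Z"}
```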
Here is the associated log snippet:
2019-10-23 19:57:18,724 [224f4f67-9ff7-48d4-bc59-93591b01aa2e:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 224f4f67-9ff7-48d4-bc59-93591b01aa2e:0:0: State to report: RUNNING
2019-10-23 19:57:18,760 [224f4f67-9ff7-48d4-bc59-93591b01aa2e:frag:0:0] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Text '2019-10-13T06:59:58Z' could not be parsed at index 19 (Text '2019-10-13T06:59:58Z' could not be parsed at index 19)
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Text
'2019-10-13T06:59:58Z' could not be parsed at index 19
Please, refer to logs for more information.
[Error Id: 7f039eed-4e4e-43fe-a1a3-d1021e61fde5 ]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630) ~[drill-common-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:101) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:296) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:283) [drill-java-exec-1.16.0.jar:1.16.0]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_191]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_191]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) [hadoop-common-2.7.4.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.16.0.jar:1.16.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
Caused by: java.time.format.DateTimeParseException: Text '2019-10-13T06:59:58Z' could not be parsed at index 19
    at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) ~[na:1.8.0_191]
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851) ~[na:1.8.0_191]
    at java.time.OffsetDateTime.parse(OffsetDateTime.java:402) ~[na:1.8.0_191]
    at org.apache.drill.exec.vector.complex.fn.VectorOutput$MapVectorOutput.writeTimestamp(VectorOutput.java:353) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.VectorOutput.innerRun(VectorOutput.java:112) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.VectorOutput$MapVectorOutput.run(VectorOutput.java:301) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.JsonReader.writeMapDataIfTyped(JsonReader.java:485) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.JsonReader.writeData(JsonReader.java:366) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch(JsonReader.java:297) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector(JsonReader.java:258) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.vector.complex.fn.JsonReader.write(JsonReader.java:204) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:218) ~[drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:223) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:271) [drill-java-exec-1.16.0.jar:1.16.0]
    ... 32 common frames omitted
2019-10-23 19:57:18,761 [224f4f67-9ff7-48d4-bc59-93591b01aa2e:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 224f4f67-9ff7-48d4-bc59-93591b01aa2e:0:0: State change requested RUNNING --> FAILED