Re: when run progrm on big data customer 2.5GB orders 5GB disply error why

Robert Metzger Thu, 18 Jun 2015 14:47:38 -0700

Hi,

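The problem is that Flink is parsing the input as CSV, but at least one
row in the data does not conform to the specified schema. The row shown
in the exception (14999999|Customer#014999999|3emQ49UZtlfjeK) has only
three fields, which is fewer than the reader needs for the requested
field mask.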
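
To find out whether the file really contains such rows (for example a
truncated last line), you could scan it outside of Flink first. The
following is only a minimal sketch in plain Java and is not part of Flink:
the class name ShortRowFinder, its command-line arguments, and the minimum
field count are assumptions for illustration. The minimum field count
should be the position of the last field your mask selects, which I cannot
see from the code you posted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a Flink class: lists rows with too few fields.
public class ShortRowFinder {

    /** Counts the fields in a line as (number of delimiters + 1). */
    public static int countFields(String line, char delimiter) {
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields++;
            }
        }
        return fields;
    }

    /** Returns true if the line has fewer fields than minFields. */
    public static boolean isTooShort(String line, char delimiter, int minFields) {
        return countFields(line, delimiter) < minFields;
    }

    public static void main(String[] args) throws IOException {
        // Assumed arguments: <path to CSV file> <minimum number of fields>
        String path = args[0];
        int minFields = Integer.parseInt(args[1]);

        long lineNo = 0;
        // ISO-8859-1 maps every byte to a character, so unexpected bytes
        // in the file cannot abort the scan with a decoding error.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(path), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (isTooShort(line, '|', minFields)) {
                    System.out.println(lineNo + ": " + line);
                }
            }
        }
    }
}
```

Running it as `java ShortRowFinder /home/hadoop/Desktop/Dataset/customer.csv 8`
would print every line with fewer than 8 fields together with its line
number (8 is the number of columns in the standard TPC-H customer table;
adjust it to your mask). If only the last line shows up, the file was
probably cut off while it was generated or copied. Depending on your Flink
version, the CSV reader may also offer ignoreInvalidLines() to skip such
rows, but that hides the problem rather than fixing the data.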



On Thu, Jun 18, 2015 at 12:51 PM, hagersaleh <[email protected]>
wrote:

> When I run the program on big data (customer 2.5 GB, orders 5 GB), it
> displays an error. Why?
>
>
> DataSource (at getCustomerDataSet(TPCHQuery3.java:252) (org.apache.flink.api.java.io.CsvInputFormat)) (1/1) switched to FAILED
> org.apache.flink.api.common.io.ParseException: Row too short: 14999999|Customer#014999999|3emQ49UZtlfjeK
>     at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:283)
>     at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:236)
>     at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:42)
>     at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:489)
>     at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:204)
>     at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:42)
>     at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:148)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
>     at java.lang.Thread.run(Thread.java:724)
>
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
> DataSet<Customer> customers = getCustomerDataSet(env, mask);
> DataSet<Orders> orders = getOrdersDataSet(env, maskorder);
>
> DefaultJoin<Customer, Orders> result = customers.join(orders).where(0).equalTo(1);
>
> result.print();
> result.writeAsCsv("/home/hadoop/Desktop/Dataset/inoptmization.csv", "\n", "|", WriteMode.OVERWRITE);
> env.execute();
> }
>
> public static class Customer extends Tuple3<Long, String, String> {
> }
>
> public static class Orders extends Tuple2<Long, Long> {
> }
>
> private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env, String mask) {
>     return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
>         .fieldDelimiter('|')
>         .includeFields(mask)
>         .ignoreFirstLine()
>         .tupleType(Customer.class);
> }
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/when-run-progrm-on-big-data-customer-2-5GB-orders-5GB-disply-error-why-tp1699.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
