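Hi,

the problem is that Flink parses the input as CSV, but at least one row in the data does not conform to the specified schema. The row reported in the ParseException below, 14999999|Customer#014999999|3emQ49UZtlfjeK, contains only three fields, so the line ends before all of the fields the reader was configured to read.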
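To locate the offending rows before running the job, a small standalone check along the following lines may help. This is only a sketch in plain Java, not part of Flink: the class name ShortRowCheck and its methods are made up for illustration, and the rule that a trailing delimiter does not open a new field is an assumption that may not match Flink's parser exactly. You pass the expected number of fields per row yourself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Flink class): reports rows with too few fields.
final class ShortRowCheck {

    // Counts delimiter-separated fields in one line.
    // Assumption: a trailing delimiter does not start an extra field.
    static int countFields(String line, char delimiter) {
        if (line.isEmpty()) {
            return 0;
        }
        int count = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                count++;
            }
        }
        if (line.charAt(line.length() - 1) == delimiter) {
            count--;
        }
        return count;
    }

    // Returns the 1-based line numbers of all rows with fewer than minFields fields.
    static List<Integer> findShortRows(List<String> lines, char delimiter, int minFields) {
        List<Integer> shortRows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (countFields(lines.get(i), delimiter) < minFields) {
                shortRows.add(i + 1);
            }
        }
        return shortRows;
    }

    // Usage: java ShortRowCheck <file> <minFields>
    // Streams the file line by line, so it also works for multi-gigabyte inputs.
    public static void main(String[] args) throws IOException {
        int minFields = Integer.parseInt(args[1]);
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (countFields(line, '|') < minFields) {
                    System.out.println(lineNo + ": " + line);
                }
            }
        }
    }
}
```

Run against customer.csv with the number of columns your file is supposed to have, this prints every line that is shorter than expected, which should show whether only a few rows are damaged or whether the file was cut off.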
On Thu, Jun 18, 2015 at 12:51 PM, hagersaleh <[email protected]> wrote:

> when run progrm on big data customer 2.5GB orders 5GB disply error why
>
> DataSource (at getCustomerDataSet(TPCHQuery3.java:252)
> (org.apache.flink.api.java.io.CsvInputFormat)) (1/1) switched to FAILED
> org.apache.flink.api.common.io.ParseException: Row too short:
> 14999999|Customer#014999999|3emQ49UZtlfjeK
>     at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:283)
>     at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:236)
>     at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:42)
>     at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:489)
>     at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:204)
>     at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:42)
>     at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:148)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
>     at java.lang.Thread.run(Thread.java:724)
>
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
>     DataSet<Customer> customers = getCustomerDataSet(env,mask);
>
>     DataSet<Orders> orders= getOrdersDataSet(env,maskorder);
>
>     DefaultJoin<Customer,Orders> result =
>         customers.join(orders).where(0).equalTo(1);
>
>     result.print();
>     result.writeAsCsv("/home/hadoop/Desktop/Dataset/inoptmization.csv", "\n",
>         "|",WriteMode.OVERWRITE);
>     env.execute();
> }
>
> public static class Customer extends Tuple3<Long,String,String> {
>
> }
>
> public static class Orders extends Tuple2<Long,Long> {
>
> }
>
> private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env,String mask) {
>     return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
>         .fieldDelimiter('|')
>         .includeFields(mask).ignoreFirstLine()
>         .tupleType(Customer.class);
> }
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/when-run-progrm-on-big-data-customer-2-5GB-orders-5GB-disply-error-why-tp1699.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
