Hi Yiannis, I've resolved the issue that appeared when I ran the code on a bigger set of data. I will try to post the code once I've polished it a bit. The partitions need to be sorted with a KeyValue sorter before bulkSaving them.
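For anyone hitting the same IOException below, here is a minimal sketch of what "sort the partitions before bulkSaving" can look like in Spark. It assumes the HFile-generation code produces an `RDD[(ImmutableBytesWritable, KeyValue)]`; `kvRdd` and the partitioner are hypothetical stand-ins, not part of the posted code:

```scala
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// HFileOutputFormat2 rejects any KeyValue that is not lexically larger
// than the previous one, so each partition must arrive at the writer in
// sorted key order. repartitionAndSortWithinPartitions does the shuffle
// and the per-partition sort in a single pass; ImmutableBytesWritable
// implements Comparable, so Spark picks up its ordering implicitly.
def sortForBulkLoad(kvRdd: RDD[(ImmutableBytesWritable, KeyValue)],
                    partitioner: Partitioner)
    : RDD[(ImmutableBytesWritable, KeyValue)] =
  kvRdd.repartitionAndSortWithinPartitions(partitioner)
```

In practice the partitioner should align with the target table's region boundaries (as HFileOutputFormat2's TotalOrderPartitioner setup does), otherwise the loader falls back to splitting HFiles at load time.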
2015-06-16 15:10 GMT+02:00 Yiannis Gkoufas <[email protected]>:
> Hi,
>
> didn't realize that I only sent to Dawid.
> Resending to the entire list in case someone else has encountered this
> error before:
>
> 15/06/10 23:45:16 WARN TaskSetManager: Lost task 34.48 in stage 0.0 (TID 816, iriclusnd20): java.io.IOException: Added a key not lexically larger than previous
> key=\x00\x17\x083661310846GMP\x00\x00\x00\x01E\xF3jH@\x010GEN\x00\x00\x01M\xDF\xA6!\xFF\x04,
> lastkey=\x00\x17\x1E7359530994GMP\x00\x00\x00\x01@\xD4\xFE\xC0\xC0\x010_0\x00\x00\x01M\xDF\xA6!\xFF\x04
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:202)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:288)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:253)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:935)
>     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:196)
>     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:149)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> I get the above error multiple times.
> The HDFS path is fine, there is no error about that.
>
> Thanks!
> On 11 June 2015 at 17:49, Dawid <[email protected]> wrote:
>> Hi,
>> Your code seems ok to me; the only difference from what I do is that I
>> explicitly pass an hdfs path to bulkSave, and I am not sure how "/bulk" is resolved.
>> I am a beginner with spark, hbase, phoenix etc., but if you'd like to
>> use this code I could try to investigate your problem; I just need the full
>> stack trace.
>>
>> On 11.06.2015 00:53, Yiannis Gkoufas wrote:
>>
>> Hi Dawid,
>>
>> yes, I have been using your code. Probably I am invoking the classes in a
>> wrong way.
>>
>> val data = readings.map(e => e.split(","))
>>   .map(e => (e(0), e(1).toLong, e(2).toDouble, e(3).toDouble))
>> val tableName = "TABLE"
>> val columns = Seq("SMID", "DT", "US", "GEN")
>> val zkUrl = Some("localhost:2181")
>> val functions = new ExtendedProductRDDFunctions(data)
>> val hfiles = functions.toHFile(tableName, columns, new Configuration, zkUrl)
>> val loader = new BulkPhoenixLoader(hfiles)
>> loader.bulkSave(tableName, "/bulk", None)
>>
>> Does the above seem the correct way to you?
>>
>> Thanks a lot!
>>
>> On 10 June 2015 at 19:13, Dawid <[email protected]> wrote:
>>> Thx a lot James. That's the case.
>>>
>>> On 10.06.2015 19:50, James Taylor wrote:
>>>> David,
>>>> It might be timestamp related. Check the timestamp of the rows/cells
>>>> you imported from the HBase shell. Are the timestamps later than the
>>>> server timestamp? In that case, you wouldn't see that data. If this is
>>>> the case, you can try specifying the CURRENT_SCN property at
>>>> connection time with a timestamp later than the timestamp of the
>>>> rows/cells to verify.
>>>> Thanks,
>>>> James
>>>>
>>>> On Wed, Jun 10, 2015 at 10:14 AM, Dawid <[email protected]> wrote:
>>>>> Yes, that's right, I have generated HFiles that I managed to load so as
>>>>> to be visible in HBase. I can't make them 'visible' to Phoenix.
>>>>>
>>>>> What I noticed today is that I have rows loaded from the generated HFiles
>>>>> and rows upserted through sqlline; when I run 'DELETE FROM TABLE' only the
>>>>> upserted ones disappear. The ones loaded from HFiles still persist in HBase.
>>>>>
>>>>> Yiannis, how do you generate the HFiles? You can see my code here:
>>>>> https://gist.github.com/dawidwys/3aba8ba618140756da7c
>>>>>
>>>>> On 10.06.2015 17:57, Yiannis Gkoufas wrote:
>>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> I am trying to do the same thing but I hit a wall while writing the HFiles,
>>>>> getting the following error:
>>>>>
>>>>> java.io.IOException: Added a key not lexically larger than previous
>>>>> key=\x00\x168675230967GMP\x00\x00\x00\x01=\xF4h)\xE0\x010GEN\x00\x00\x01M\xDE.\xB4T\x04,
>>>>> lastkey=\x00\x168675230967GMP\x00\x00\x00\x01=\xF5\x0C\xF5`\x010_0\x00\x00\x01M\xDE.\xB4T\x04
>>>>>
>>>>> You have reached the point where you are generating the HFiles, loading them,
>>>>> but you don't see any rows in the table?
>>>>> Is that correct?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 8 June 2015 at 18:09, Dawid <[email protected]> wrote:
>>>>>>
>>>>>> Yes, I did. I also tried to execute some upserts using sqlline after
>>>>>> importing HFiles, and rows from upserts are visible both in sqlline and
>>>>>> hbase shell, but the rows imported from HFiles are only in hbase shell.
>>>>>>
>>>>>> On 08.06.2015 19:06, James Taylor wrote:
>>>>>>> Dawid,
>>>>>>> Perhaps a dumb question, but did you execute a CREATE TABLE statement
>>>>>>> in sqlline for the tables you're importing into? Phoenix needs to be
>>>>>>> told the schema of the table (i.e. it's not enough to just create the
>>>>>>> table in HBase).
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>> On Mon, Jun 8, 2015 at 10:02 AM, Dawid <[email protected]> wrote:
>>>>>>>> Any suggestions? Some clues what to check?
>>>>>>>>
>>>>>>>> On 05.06.2015 23:21, Dawid wrote:
>>>>>>>>
>>>>>>>> Yes, I can see it in hbase-shell.
>>>>>>>>
>>>>>>>> Sorry for the bad links, I hadn't used private repositories on github
>>>>>>>> before. So I moved the files to a gist:
>>>>>>>> https://gist.github.com/dawidwys/3aba8ba618140756da7c
>>>>>>>> Hope this time it will work.
>>>>>>>>
>>>>>>>> On 05.06.2015 23:09, Ravi Kiran wrote:
>>>>>>>>
>>>>>>>> Hi Dawid,
>>>>>>>> Do you see the data when you run a simple scan or count of the table
>>>>>>>> in the HBase shell?
>>>>>>>>
>>>>>>>> FYI, the links lead me to a 404: File not found.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Ravi
>>>>>>>>
>>>>>>>> On Fri, Jun 5, 2015 at 1:17 PM, Dawid <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>> I was trying to write some utilities to bulk load data through HFiles
>>>>>>>>> from Spark RDDs.
>>>>>>>>> I tried to follow the pattern of CSVBulkLoadTool. I managed to generate
>>>>>>>>> some HFiles and load them into HBase, but I can't see the rows using
>>>>>>>>> sqlline. I would be more than grateful for any suggestions.
>>>>>>>>>
>>>>>>>>> The classes can be accessed at:
>>>>>>>>>
>>>>>>>>> https://github.com/dawidwys/gate/blob/master/src/main/scala/pl/edu/pw/elka/phoenix/BulkPhoenixLoader.scala
>>>>>>>>>
>>>>>>>>> https://github.com/dawidwys/gate/blob/master/src/main/scala/pl/edu/pw/elka/phoenix/ExtendedProductRDDFunctions.scala
>>>>>>>>>
>>>>>>>>> Thanks in advance
>>>>>>>>>
>>>>>>>>> Dawid Wysakowicz
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards
>>>>>>>>> Dawid
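[Editor's sketch, following up on James's CURRENT_SCN suggestion earlier in the thread: a minimal way to verify timestamp visibility over JDBC, assuming Phoenix's "CurrentSCN" connection property and a ZooKeeper quorum on localhost:2181; the URL, the one-hour offset, and the table name are placeholders, not values confirmed by the thread.]

```scala
import java.sql.DriverManager
import java.util.Properties

// Open a Phoenix connection with CurrentSCN set later than the
// timestamps of the bulk-loaded cells; queries through this connection
// read as of that SCN, so cells stamped "in the future" relative to the
// server clock become visible.
val props = new Properties()
props.setProperty("CurrentSCN",
  (System.currentTimeMillis() + 3600000L).toString) // now + 1h, a guess
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)
try {
  val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM TABLE")
  while (rs.next()) println(rs.getLong(1)) // nonzero if the rows are visible
} finally {
  conn.close()
}
```

If the count includes the bulk-loaded rows here but not on a plain connection, the cell timestamps are ahead of the server clock, which matches James's diagnosis.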
