Re: Importing Postgres-generated CSVs into Phoenix

Ken Hampson Fri, 08 Jan 2016 19:23:17 -0800

Sure, no problem. I've added a comment indicating my interest to SQOOP-2649.


Thanks,
- Ken

On Fri, Jan 8, 2016 at 8:33 PM James Taylor <[email protected]> wrote:

> Sounds good, Ken. If you could comment on SQOOP-2649 that you'd be
> interested in using that functionality if it were available, that'd be much
> appreciated.
>
> Thanks,
> James
>
> On Fri, Jan 8, 2016 at 5:04 PM, Ken Hampson <[email protected]> wrote:
>
>> Hi, James:
>>
>> Thanks for the reply. I took a look at the patch attached to
>> PHOENIX-2434, and it does look like it has a good chance to solve the
>> problem, as Postgres exports boolean values as 't' and 'f', which the patch
>> now checks for. Once we're able to get to Phoenix 4.7.0, I'll definitely
>> try it out to confirm.
>>
>> We don't and won't be able to use Flyway due to various architectural
>> considerations, but we do use Sqoop to get data from Postgres into Hive. I
>> had skimmed  SQOOP-2649 when originally looking at how to bulk load into
>> Phoenix, and figured CSV was the way to go for now since the  SQOOP-2649
>> enhancement was still in patch form. It's definitely something I will keep
>> an eye on going forward.
>>
>> Thanks again,
>> - Ken
>>
>>
>> On Fri, Jan 8, 2016 at 1:13 PM James Taylor <[email protected]>
>> wrote:
>>
>>> Hi Ken,
>>> PHOENIX-2434 improved our CSV handling of booleans and will appear in
>>> our upcoming 4.7.0 release. It'd be good if you can confirm whether or not
>>> this is what you need. We definitely want to support ingest of CSVs from
>>> other RDBMSs.
>>>
>>> There are a couple of other avenues of ingest into Phoenix:
>>> - using Flyway DB (http://flywaydb.org/) where Phoenix support has been
>>> added (not sure if this has appeared in a release yet or not, though)
>>> - through a pull request to add support for Phoenix in Scoop (
>>> http://sqoop.apache.org/). Unfortunately, this isn't getting much
>>> attention in that community (if this is useful for you, you may want to
>>> chime in on SQOOP-2649 or on their mailing list).
>>>
>>> Thanks,
>>> James
>>>
>>> On Thu, Jan 7, 2016 at 5:28 PM, Ken Hampson <[email protected]> wrote:
>>>
>>>> Hi, all:
>>>>
>>>> I am in the process of importing some data from Postgres into Phoenix
>>>> as part of a larger project. I noticed after doing so and cross-checking
>>>> the results of a fairly complex JOIN query (of 6 tables of so) that the
>>>> number of rows returned by Postgres and Phoenix differed -- Phoenix had a
>>>> significant number more rows. I investigated this further, and ultimately
>>>> determined that the reason was that all *boolean *values were imported
>>>> as being false, regardless of their actual value.
>>>>
>>>> When I investigated that aspect further, I discovered that Phoenix
>>>> appears to require the full strings "true" and "false" in the CSV in order
>>>> to properly register their actual values. As a workaround, I am currently
>>>> using a full SELECT in the "\copy" statement in psql to select each column
>>>> individually, and for each boolean column, I am using a CASE statement to
>>>> emit the full strings. I have since also learned that casting the boolean
>>>> to the "text" datatype will also cause Postgres to emit the full string.
>>>>
>>>> In either case, though, the pain point is that this discrepancy
>>>> requires that I examine each table closely for boolean columns, and if
>>>> there are any, I then need to enumerate all the columns out explicitly and
>>>> use either the CASE statement or the cast method on the boolean columns in
>>>> order to get the values Phoenix expects. In my case, I didn't need to
>>>> transform any of the data otherwise, so this forced a much more complex
>>>> procedure.
>>>>
>>>> While psql provides many options to change various values -- delimiters
>>>> and what not -- there is no option to change the boolean values. Likewise,
>>>> as near as I can tell, there are no options to change what Phoenix expects,
>>>> either.
>>>>
>>>> I am wondering:
>>>>
>>>> - Is there any way from within Phoenix to change the boolean values
>>>> that it expects in the CSV?
>>>> - Are there any other methods from a Phoenix perspective that you can
>>>> think of that would obviate the need for the column-by-column enumeration
>>>> procedure I outlined above?
>>>> - Since Postgres is a major open source RDBMS which I suspect many
>>>> people -- myself included -- will want to export data from into Phoenix,
>>>> could there be an enhancement to Phoenix to allow an option to specify a
>>>> default boolean value, or perhaps an option to specify the CSV source, from
>>>> which the proper value could be defined and utilized automatically?
>>>>
>>>> Thanks,
>>>> - Ken
>>>>
>>>
>>>
>

Re: Importing Postgres-generated CSVs into Phoenix

Reply via email to